Improvements on Speech Recognition for Fast Talkers

Authors

  • M. Richardson
  • M. Hwang
  • A. Acero
Abstract

The accuracy of a speech recognition (SR) system depends on many factors, such as the presence of background noise, mismatches in microphone and language models, variations in speaker, accent and even speaking rates. In addition to fast speakers, even normal speakers will tend to speak faster when using a speech recognition system in order to get higher throughput. Unfortunately, state-of-the-art SR systems perform significantly worse on fast speech. In this paper, we present our efforts in making our system more robust to fast speech. We propose cepstrum length normalization, applied to the incoming testing utterances, which results in a 13% word error rate reduction on an independent evaluation corpus. Moreover, this improvement is additive to the contribution of Maximum Likelihood Linear Regression (MLLR) adaptation. Together with MLLR, a 23% error rate reduction was achieved.
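
The abstract does not spell out how cepstrum length normalization works. One plausible reading, offered here only as an illustrative assumption, is that the sequence of cepstral frames from a fast utterance is stretched along the time axis so that its segment durations look closer to normal-rate speech. The sketch below shows such a stretch using plain linear interpolation between neighbouring frames. The function name `stretch_frames`, the `factor` parameter, and the choice of linear interpolation are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def stretch_frames(frames: Sequence[Sequence[float]], factor: float) -> List[List[float]]:
    """Resample a sequence of cepstral frames along time by `factor`.

    Hypothetical helper: factor > 1 lengthens the utterance (e.g. for fast
    speech), factor < 1 shortens it. Linear interpolation is an assumption
    made for illustration, not the method stated in the abstract.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n = len(frames)
    if n == 0:
        return []
    m = max(1, round(n * factor))  # number of output frames
    if n == 1 or m == 1:
        return [list(frames[0]) for _ in range(m)]
    out = []
    for j in range(m):
        # Map output index j onto the input time axis [0, n - 1].
        pos = j * (n - 1) / (m - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo  # interpolation weight toward the later frame
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


# Toy example: three one-dimensional "frames" stretched to twice the length.
stretched = stretch_frames([[0.0], [1.0], [2.0]], 2.0)
print(len(stretched))
```

In a real system the stretch factor would have to be estimated per utterance, for example from a measured speaking rate; how that estimate is obtained is outside what the abstract states.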

Similar resources

Intelligibility of clear and conversational speech of young and elderly talkers.

It has been documented that talkers can be trained to produce "clear" speech, which is significantly more intelligible for hearing-impaired listeners. In this study, the ability of both younger and older talkers to produce clear speech after a minimal amount of instruction and practice was investigated. Tape recordings were made with the talkers attempting to produce both conversational-style a...

Which Phoneme-to-Viseme Maps Best Improve Visual-Only Computer Lip-Reading?

A critical assumption of all current visual speech recognition systems is that there are visual speech units called visemes which can be mapped to units of acoustic speech, the phonemes. Despite there being a number of published maps it is infrequent to see the effectiveness of these tested, particularly on visual-only lip-reading (many works use audio-visual speech). Here we examine 120 mappin...

Investigating the Narrative Skills of Late Talkers Through Sequential Picture Stories

Objectives: The purpose of the present study is to investigate the oral narrative skills of late talkers, mostly caused by mental disorders, while they try to comprehend a wordless sequential picture story to create and narrate the relevant story. Methods: To this end, 15 participants (10 male and 5 female) who were students of a specialized school for physically and mentally retarded...

Effectiveness of computer-based auditory training in improving the perception of noise-vocoded speech.

Five experiments were designed to evaluate the effectiveness of "high-variability" lexical training in improving the ability of normal-hearing subjects to perceive noise-vocoded speech that had been spectrally shifted to simulate tonotopic misalignment. Two approaches to training were implemented. One training approach required subjects to recognize isolated words, while the other training appr...

TALKER BACKGROUND AND INDIVIDUAL DIFFERENCES IN THE SPEECH INTELLIGIBILITY BENEFIT

One way talkers can increase intelligibility is by producing clear speech. Though clear speech, as opposed to conversational speech (ConvS), generally increases intelligibility (known as the clear speech intelligibility benefit), not all talkers exhibit the same degree of benefit. Ferguson showed that while intelligibility increased across talkers for clear speech, when looking at individual ta...

Publication date: 1999